System and method for compressing portions of a media signal using different codecs

ABSTRACT

An input module obtains a media signal to be communicated to a destination system, after which an identification module identifies a plurality of scenes within the media signal. A selection module automatically selects different codecs from a codec library to respectively compress at least two of the scenes. The codecs are automatically selected to produce a highest compression quality for the respective scenes according to a set of criteria without exceeding a target data rate. A compression module then compresses the scenes using the automatically selected codecs, after which an output module delivers the compressed scenes to the destination system with an indication of which codec was used to compress each scene.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 10/692,106, filed Oct. 23, 2003, which is a continuation-in-part of U.S. patent application Ser. No. 10/256,866, filed Sep. 26, 2002, which claims the benefit of Provisional Application No. 60/325,483, filed Sep. 26, 2001. All of the foregoing applications are incorporated herein by reference.

TECHNICAL FIELD

The present invention relates generally to the field of data compression. More specifically, the present invention relates to techniques for optimizing the compression of video and audio signals.

BACKGROUND OF THE INVENTION

In the communication age, bandwidth is money. Video and audio signals (hereinafter “media signals”) consume enormous amounts of bandwidth depending on the desired transmission quality. As a result, data compression is playing an increasingly important role in communication.

Conventionally, the parties to a communication decide on a particular codec (compressor/decompressor) for compressing and decompressing media signals. A wide variety of codecs are available. General classifications of codecs include discrete cosine transform (DCT) or “block” codecs, fractal codecs, and wavelet codecs.

Some codecs are “lossless,” meaning that no data is lost during the compression process. A compressed media signal, after being received and decompressed by a lossless codec, is identical to the original. However, most commercially-available codecs are “lossy” and result in some degradation of the original media signal.

For lossy codecs, compression “quality” (i.e., how similar a compressed media signal is to the original after decompression) varies substantially from codec to codec, and may depend, for instance, on the amount of available bandwidth, the quality of the communication line, characteristics of the media signal, etc. Another compression metric, i.e., performance, relates to the amount of bandwidth required to transmit the compressed signal as opposed to the original signal. Typically, lossy codecs result in better performance than lossless codecs, which is why they are preferred in most applications.

Codec designers generally attempt to fashion codecs that produce high quality compressed output across a wide range of operating parameters. Although some codecs, such as MPEG-2, have gained widespread acceptance because of their general usefulness, no codec is ideally suited to all purposes. Each codec has individual strengths and weaknesses.

Conventionally, the same codec is used to compress and decompress a media signal during the entire communication session or uniformly across a storage medium (e.g., DVD). However, a media signal is not a static quantity. A video signal, for example, may change substantially from scene to scene. Likewise, the available bandwidth or line quality may change during the course of a communication. Selecting the wrong codec at the outset can be a costly mistake in terms of the bandwidth required to transmit or store the media signal.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a conventional communication system using data compression;

FIG. 2 is a block diagram of a communication system using multiple codecs for compressing portions of a media signal according to an embodiment of the invention;

FIG. 3 is a detailed block diagram of a source system according to a first embodiment of the invention;

FIG. 4 is a detailed block diagram of a source system according to a second embodiment of the invention;

FIG. 5 is a detailed block diagram of a selection module;

FIG. 6 is a data flow diagram of a process for automatically selecting a codec;

FIG. 7 is a detailed block diagram of an artificial intelligence system;

FIG. 8 is a data flow diagram of a process for automatically selecting settings for a codec;

FIG. 9 is a block diagram of a comparison module showing the introduction of a licensing cost factor;

FIG. 10 is a block diagram of a process for modifying a target data rate; and

FIG. 11 shows a video frame subdivided into a plurality of sub-frames.

DETAILED DESCRIPTION

Reference is now made to the figures in which like reference numerals refer to like elements. For clarity, the first digit of a reference numeral indicates the figure number in which the corresponding element is first used.

In the following description, numerous specific details of programming, software modules, user selections, network transactions, database queries, database structures, etc., are provided for a thorough understanding of the embodiments of the invention. However, those skilled in the art will recognize that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc.

In some cases, well-known structures, materials, or operations are not shown or described in detail in order to avoid obscuring aspects of the invention. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

FIG. 1 is a block diagram of a conventional system 100 for communicating media signals from a source system 102 to a destination system 104. The source and destination systems 102, 104 may be variously embodied, for example, as personal computers (PCs), cable or satellite set-top boxes (STBs), or video-enabled portable devices, such as personal digital assistants (PDAs) or cellular telephones.

Within the source system 102, a video camera 106 or other device captures an original media signal 108. A codec (compressor/decompressor) 110 processes the original media signal 108 to create a compressed media signal 112, which may be delivered to the destination system 104 via a network 114, such as a local area network (LAN) or the Internet. Alternatively, the compressed media signal 112 could be written to a storage medium, such as a CD, DVD, flash memory device, or the like.

At the destination system 104, the same codec 110 processes the compressed media signal 112 received through the network 114 to generate a decompressed media signal 116. The destination system 104 then presents the decompressed media signal 116 on a display device 118, such as a television or computer monitor.

Conventionally, the source system 102 uses a single codec 110 to process the entire media signal 108 during a communication session or for a particular storage medium. However, as noted above, a media signal is not a static quantity. Video signals may change substantially from scene to scene. A single codec, which may function well under certain conditions, may not fare so well under different conditions. Changes in available bandwidth, line conditions, or characteristics of the media signal itself may drastically change the compression quality to the point that a different codec may do much better. In certain cases, a content developer may be able to manually specify a change of codec 110 within a media signal 108 where, for instance, the content developer knows that one codec 110 may be superior to another codec 110. However, this requires significant human effort and cannot be performed in real time.

FIG. 2 is a block diagram of an alternative system 200 for communicating media signals from a source system 202 to a destination system 204 according to an embodiment of the present invention. As before, the source system 202 receives an original media signal 108 captured by a video camera 106 or other suitable device.

However, unlike the system 100 of FIG. 1, the depicted system 200 is not limited to using a single codec 110 during a communication session or for a particular storage medium. Rather, as described in greater detail below, each scene 206 or segment of the original media signal 108 may be compressed using one of a plurality of codecs 110. A scene 206 may include one or more frames of the original media signal 108. In the case of video signals, a frame refers to a single image in a sequence of images. More generally, however, a frame refers to a packet of information used for communication.

As used herein, a scene 206 may correspond to a fixed segment of the media signal 108, e.g., two seconds of audio/video or a fixed number of frames. In other embodiments, however, a scene 206 may be defined by characteristics of the original media signal 108, i.e., a scene 206 may include two or more frames sharing similar characteristics. When one or more characteristics of the original media signal 108 changes beyond a preset threshold, the source system 202 may detect the beginning of a new scene 206. Thus, while the video camera 106 focuses on a static object, a scene 206 may last until the camera 106, the object, or both are moved.
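By way of illustration only, the threshold-based detection of a new scene 206 might be sketched as follows. The function name `split_into_scenes`, the representation of a frame as a flat list of sample values, and the use of mean absolute difference as the monitored characteristic are all assumptions made for this sketch; the description above does not prescribe any particular characteristic or threshold.

```python
from typing import List, Sequence


def split_into_scenes(frames: Sequence[Sequence[float]],
                      threshold: float) -> List[List[int]]:
    """Group frame indices into scenes (hypothetical helper).

    A new scene begins whenever the mean absolute difference between a
    frame and its predecessor exceeds `threshold`.
    """
    scenes: List[List[int]] = []
    for i, frame in enumerate(frames):
        if i == 0:
            scenes.append([0])
            continue
        prev = frames[i - 1]
        diff = sum(abs(a - b) for a, b in zip(frame, prev)) / len(frame)
        if diff > threshold:
            scenes.append([i])    # characteristic changed beyond the threshold
        else:
            scenes[-1].append(i)  # frame still belongs to the current scene
    return scenes


# Two nearly static frames, then an abrupt change, then a static frame again.
frames = [[10, 10, 10], [11, 10, 10], [200, 190, 180], [201, 190, 180]]
scenes = split_into_scenes(frames, threshold=20.0)
```

In this toy input the first two frames form one scene and the last two form another, because only the transition between the second and third frames exceeds the assumed threshold.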

As illustrated, two adjacent scenes 206 within the same media signal 108 may be compressed using different codecs 110. The codecs 110 may be of the same general type, e.g., discrete cosine transform (DCT), or of different types. For example, one codec 110a may be a DCT codec, while another codec 110b is a fractal codec, and yet another codec 110c is a wavelet codec.

Unlike conventional systems 100, the system 200 of FIG. 2 automatically selects, from the available codecs 110, a particular codec 110 best suited to compressing each scene 206. Details of the selection process are described in greater detail below. Briefly, however, the system 200 “remembers” which codecs 110 are used for scenes 206 having particular characteristics. If a subsequent scene 206 is determined to have the same characteristics, the same codec 110 is used. However, if a scene 206 is found to have substantially different characteristics from those previously observed, the system 200 tests various codecs 110 on the scene 206 and selects the codec 110 producing the highest compression quality (i.e., how similar the compressed media signal 210 is to the original signal 108 after decompression) for a particular target data rate.

In addition, the source system 202 reports to the destination system 204 which codec 110 was used to compress each scene 206. As illustrated, this may be accomplished by associating codec identifiers 208 with each scene 206 in the resulting compressed media signal 210. The codec identifiers 208 may precede each scene 206, as shown, or could be sent as a block at some point during the transmission. The precise format of the codec identifiers 208 is not crucial to the invention and may be implemented using standard data structures known to those of skill in the art.

The destination system 204 uses the codec identifiers 208 to select the appropriate codecs 110 for decompressing the respective scenes 206. The resulting decompressed media signal 116 may then be presented on the display device 118, as previously described.
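As one possible illustration of codec identifiers 208 preceding each scene 206, the following sketch frames every compressed scene with a four-byte identifier and a four-byte length. This layout, the `CODECS` registry, and the helper names are assumptions of the sketch, since the precise format is left open above. General-purpose lossless compressors from the Python standard library stand in for media codecs 110.

```python
import bz2
import struct
import zlib

# Hypothetical registry: 4-byte identifier -> (compress, decompress).
# zlib and bz2 are stand-ins for real media codecs.
CODECS = {
    b"ZLIB": (zlib.compress, zlib.decompress),
    b"BZ2 ": (bz2.compress, bz2.decompress),
}


def pack_scenes(scenes):
    """Source side: emit identifier + length + payload for each scene."""
    out = bytearray()
    for codec_id, raw in scenes:
        compress, _ = CODECS[codec_id]
        payload = compress(raw)
        out += codec_id + struct.pack(">I", len(payload)) + payload
    return bytes(out)


def unpack_scenes(stream):
    """Destination side: read each identifier and pick the matching codec."""
    scenes, pos = [], 0
    while pos < len(stream):
        codec_id = stream[pos:pos + 4]
        (length,) = struct.unpack(">I", stream[pos + 4:pos + 8])
        _, decompress = CODECS[codec_id]
        scenes.append(decompress(stream[pos + 8:pos + 8 + length]))
        pos += 8 + length
    return scenes


signal = pack_scenes([(b"ZLIB", b"scene one " * 20), (b"BZ2 ", b"scene two " * 20)])
restored = unpack_scenes(signal)
```

Sending the identifiers as a single block, as also contemplated above, would only change where the tags are stored, not how the destination uses them.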

FIG. 3 illustrates additional details of the source system 202. In one embodiment, an input module 302 receives the original media signal 108 from the video camera 106 or other source device. An identification module 304 divides the original media signal 108 into scenes 206 and identifies various characteristics (not shown) of each scene 206, as described in greater detail below.

Thereafter, for each scene 206, a selection module 306 uses the characteristics (or the scene 206 itself) to select the optimal codec 110 from a codec library 308. As used herein, “optimal” means producing the highest compression quality for the compressed media signal 210 at a particular target data rate (among those codecs 110 within the codec library 308).

In one embodiment, a user may specify a particular target data rate, e.g., 128 kilobits per second (kbps). Alternatively, the target data rate may be determined by the available bandwidth or in light of other constraints.

The codec library 308 may include a wide variety of codecs 110. Examples of possible video codecs 110 are provided in the following table. In addition, various audio-only codecs may be provided, such as MPEG Audio Layer 3 (MP3), MPEG-4 Structured Audio (MP4-SA), CCITT u-Law, Ogg Vorbis, and AC3. Of course, other presently-available or yet-to-be-developed codecs 110 may be used within the scope of the invention.

TABLE 1 FOURCC Name Owner FOURCC Name Owner 3IV1 3ivx 3IVX MPG4 MPEG-4Microsoft 3IV2 3ivx 3IVX MPGI MPEG Sigma Designs AASC Autodesk AnimatorAutodesk MRCA Mrcodec FAST codec Multimedia ADV1 WaveCodec Loronix MRLEMicrosoft RLE Microsoft ADVJ Avid M-JPEG Avid Technology MSVC MicrosoftMicrosoft Video 1 AEMI Array VideoONE Array MSZH AVImszh Kenji OshimaMPEG1-I Capture Microsystems AFLI Autodesk Animator Autodesk MTX1 Matroxcodec through MTX9 AFLC Autodesk Animator Autodesk MV12 codec AMPG ArrayVideoONE Array MWV1 Aware Motion Aware Inc. MPEG Microsystems WaveletsANIM RDX Intel nAVI AP41 AngelPotion AngelPotion NTN1 Video NogatechDefinitive Compression 1 ASV1 Asus Video Asus NVDS NVidia NVidia TextureFormat ASV2 Asus Video (2) Asus NVHS NVidia NVidia Texture Format ASVXAsus Video 2.0 Asus NHVU NVidia NVidia Texture Format AUR2 Aura 2Codec-YUV Auravision NVS0-NVS5 NVidia 422 AURA Aura 1 Codec-YUVAuravision NVT0-NVT5 NVidia 411 AVRn Avid M-JPEG Avid Technology PDVCDVC codec I-O Data Device, Inc. 
BINK Bink Video RAD Game Tools PGVVRadius Video Radius Vision BT20 Prosumer Video Conexant PHMO PhotomotionIBM BTCV Composite Video Conexant PIM1 Pegasus Imaging Codec BW10Broadway MPEG Data Translation PIM2 Pegasus Imaging Capture/CompressionCC12 YUV12 Codec Intel PIMJ Lossless JPEG Pegasus Imaging CDVC CanopusDV Codec Canopus PIXL Video XL Pinnacle Systems CFCC DPS PerceptionDigital Processing PVEZ PowerEZ Horizons Systems Technology CGDICamcorder Video Microsoft PVMM PacketVideo PacketVideo CorporationCorporation MPEG-4 CHAM Caviara Champagne Winnov PVW2 Pegasus PegasusImaging Wavelet Compression CMYK Uncompressed Colorgraph qpeq QPEG 1.1Q-Team CMYK CJPG WebCam JPEG Creative Labs QPEG QPEG Q-Team CPLA YUV4:2:0 Weitek raw Raw RGB CRAM Microsoft Video 1 Microsoft RGBT 32 bitsupport Computer Concepts CVID Cinepak Providenza & RLE Run LengthMicrosoft Boekelheide Encoder CWLT Color WLT DIB Microsoft RLE4 4bpp RunMicrosoft Length Encoder CYUV Creative YUV Creative Labs RLE8 8bpp RunMicrosoft Length Encoder CYUY ATI Technologies RMP4 MPEG-4 AS SigmaDesigns Profile Codec D261 H.261 DEC RT21 Real Time Intel Video 2.1 D263H.263 DEC rv20 RealVideo G2 Real DIV3 DivX MPEG-4 DivX rv30 RealVideo 8Real DIV4 DivX MPEG-4 DivX RVX RDX Intel DIV5 DivX MPEG-4 DivX s422VideoCap Tekram C210 International YUV Codec DIVX DivX OpenDivX SAN3DivX 3 divx DivX SDCC Digital Camera Sun Codec Communications DMB1Rainbow Runner Matrox SEDG Samsung Samsung hardware MPEG-4 compressionDMB2 Rainbow Runner Matrox SFMC Surface Fitting CrystalNet hardwareMethod compression DSVD DV Codec SMSC Proprietary Radius codec DUCKTrueMotion S Duck Corporation SMSD Proprietary Radius codec dv25 DVCPROMatrox smsv Wavelet Video WorldConnect (corporate site) dv50 DVCPRO50Matrox SP54 SunPlus dvsd Pinnacle Systems SPIG Spigot Radius DVE2 DVE-2InSoft SQZ2 VXTreme Microsoft Videoconferencing Video Codec Codec V2DVX1 DVX1000SP Video Lucent SV10 Video R1 Sorenson Media Decoder DVX2DVX2000S Video Lucent STVA ST CMOS ST 
Decoder Imager DataMicroelectronics DVX3 DVX3000S Video Lucent STVB ST CMOS ST DecoderImager Data Microelectronics DX50 DivX MPEG-4 DivX STVC ST CMOS STversion 5 Imager Data Microelectronics (Bunched) DXTn DirectX CompressedMicrosoft STVX ST CMOS ST Texture Imager Data Microelectronics DXTCDirectX Texture Microsoft STVY ST CMOS ST Compression Imager DataMicroelectronics ELK0 Elsa Quick Codec Elsa SVQ1 Sorenson Sorenson MediaVideo EKQ0 Elsa Quick Codec Elsa TLMS Motion TeraLogic Intraframe CodecESCP Escape Eidos Technologies TLST Motion TeraLogic Intraframe CodecETV1 eTreppid Video eTreppid TM20 TrueMotion Duck Codec Technologies 2.0Corporation ETV2 eTreppid Video eTreppid TM2X TrueMotion Duck CodecTechnologies 2X Corporation ETVC eTreppid Video eTreppid TMIC MotionTeraLogic Codec Technologies Intraframe Codec FLJP Field EncodedD-Vision TMOT TrueMotion S Horizons Motion JPEG Technology FRWA ForwardMotion SoftLab-Nsk TR20 TrueMotion Duck JPEG with alpha RT 2.0Corporation channel FRWD Forward Motion SoftLab-Nsk TSCC TechSmithTechsmith Corp. JPEG Screen Capture Codec FVF1 Fractal Video FrameIterated Systems TV10 Tecomac Low- Tecomac, Inc. Bit Rate Codec GLZWMotion LZW gabest@freemail.hu TVJP Pinnacle/Truevision GPEG Motion JPEGgabest@freemail.hu TVMJ Pinnacle/Truevision GWLT Greyscale WLT DIBMicrosoft TY2C Trident Trident Decompression Microsystems H260 ITU H.26nIntel TY2N Trident through Microsystems H269 HFYU Huffman Lossless TY0NTrident Codec Microsystems HMCR Rendition Motion Rendition UCODClearVideo eMajix.com Compensation Format HMRR Rendition MotionRendition ULTI Ultimotion IBM Corp. 
Compensation Format i263 ITU H.263Intel V261 Lucent Lucent VX2000S IAN Indeo 4 Codec Intel V655 YUV 4:2:2Vitec Multimedia ICLB CellB InSoft VCR1 ATI Video ATI VideoconferencingCodec 1 Technologies Codec IGOR Power DVD VCR2 ATI Video ATI Codec 2Technologies IJPG Intergraph JPEG Intergraph VCR3-9 ATI Video ATI CodecsTechnologies ILVC Layered Video Intel VDCT VideoMaker Vitec MultimediaPro DIB ILVR ITU H.263+ Codec VDOM VDOWave VDONet IPDV Giga AVI DV CodecI-O Data Device, VDOW VDOLive VDONet Inc. IR21 Indeo 2.1 Intel VDTZVideoTizer Darim Vision Co. YUV Codec IRAW Intel Uncompressed Intel VGPXVideoGramPix Alaris UYUV IV30 Indeo 3 Ligos VIFP VFAPI Codec throughIV39 IV32 Indeo 3.2 Ligos VIDS Vitec Multimedia IV40 Indeo InteractiveLigos VIVO Vivo H.263 Vivo Software through IV49 IV50 Indeo InteractiveLigos VIXL Video XL Pinnacle Systems JBYR Kensington VLV1 VideoLogicJPEG JPEG Still Image Microsoft VP30 VP3 On2 JPGL JPEG Light VP31 VP3On2 L261 Lead H.26 Lead Technologies vssv VSS Video Vanguard SoftwareSolutions L263 Lead H.263 Lead Technologies VX1K VX1000S Lucent VideoCodec LCMW Motion CMW Codec Lead Technologies VX2K VX2000S Lucent VideoCodec LEAD LEAD Video Codec Lead Technologies VXSP VX1000SP Lucent VideoCodec LGRY Grayscale Image Lead Technologies VYU9 ATI YUV ATITechnologies Ljpg LEAD MJPEG Lead Technologies VYUY ATI YUV ATI CodecTechnologies LZO1 Lempel-Ziv- Markus Oberhumer WBVC W9960 WinbondOberhumer Codec Electronics M263 H.263 Microsoft WHAM MicrosoftMicrosoft Video 1 M261 H.261 Microsoft WINX Winnov Winnov SoftwareCompression M452 MPEG-4 Microsoft WJPG Winbond (automatic WMP JPEGdownload) MC12 Motion ATI Technologies WNV1 Winnov Winnov CompensationHardware Format Compression MCAM Motion ATI Technologies x263 XirlinkCompensation Format MJ2C Motion JPEG 2000 Morgan XVID XVID MPEG-4 XVIDMultimedia mJPG Motion JPEG IBM XLV0 XL Video NetXL Inc. 
includingHuffman Decoder Tables MJPG Motion JPEG XMPG XING MPEG XING CorporationMMES MPEG-2 ES Matrox XWV0-XWV9 XiWave Video XiWave Codec MP2A Evaldownload Media Excel XXAN Origin MP2T Eval download Media Excel Y411 YUV4:1:1 Microsoft MP2V Eval download Media Excel Y41P Brooktree ConexantYUV 4:1:1 MP42 MPEG-4 Microsoft Y8 Grayscale (automatic WMP videodownload) MP43 MPEG-4 Microsoft YC12 YUV 12 codec Intel (automatic WMPdownload) MP4A Eval download Media Excel YUV8 Caviar YUV8 Winnov MP4SMPEG-4 Microsoft YUY2 Raw, Microsoft (automatic WMP uncompresseddownload) YUV 4:2:2 MP4T Eval download Media Excel YUYV Canopus MP4VEval download Media Excel ZLIB MPEG MPEG ZPEG Video Zipper Metheus MPG4MPEG-4 Microsoft ZyGo ZyGoVideo ZyGo Digital (automatic WMP download)

Those of skill in the art will recognize that many of the above-described codecs may be deemed “generalist” codecs in that they achieve a high compression quality for a wide variety of media signals and conditions. However, other codecs may be deemed “specialist” codecs because they compress certain types of media signals well or compress many types of media signals well under certain conditions. Providing a codec library 308 that includes a variety of both generalist and specialist codecs, including codecs of different families, typically results in the best overall compression quality for a compressed media signal 210.

Referring again to FIG. 3, after a codec 110 is selected for a scene 206, a compression module 310 compresses the scene 206 using the selected codec 110. An output module 312 receives the resulting compressed media signal 210 and, in one embodiment, adds codec identifiers 208 to indicate which codecs 110 were used to compress each scene 206. In other embodiments, the codec identifiers 208 may be added by the compression module 310 or at other points in the compression process. The output module 312 then delivers the compressed media signal 210 to the destination system 204 via the network 114.

The embodiment of FIG. 3 is primarily applicable to streaming media applications, including video conferencing. In an alternative embodiment, as depicted in FIG. 4, the output module 312 may be coupled to a storage device 402, such as a CD or DVD recorder, flash card writer, or the like. As depicted, the compressed media signal 210 (and codec identifiers 208) may be stored on an appropriate storage medium 404, which is physically delivered to the destination system 204. In such an embodiment, the destination system 204 would include a media reader (not shown) for reading the compressed media signal 210 from the storage medium 404.

Unlike conventional media compression techniques, the original media signal 108 is not compressed using a single codec (e.g., MPEG-2 as in DVDs). Rather, each scene 206 is automatically compressed using the best codec 110 selected from a codec library 308 for that scene 206. Using the above-described technique, between 10 and 12 hours of DVD-quality video may be stored on a single recordable DVD.

FIG. 5 illustrates additional details of the selection module 306. As noted above, the identification module 304 receives the original media signal 108 and identifies individual scenes 206, as well as characteristics 502 of each scene 206. The characteristics 502 may include, for instance, motion characteristics, color characteristics, YUV signal characteristics, color grouping characteristics, color dithering characteristics, color shifting characteristics, lighting characteristics, and contrast characteristics. Those of skill in the art will recognize that a wide variety of other characteristics of a scene 206 may be identified within the scope of the invention.

Motion is composed of vectors resulting from object detection. Relevant motion characteristics may include, for example, the number of objects, the size of the objects, the speed of the objects, and the direction of motion of the objects.

With respect to color, each pixel typically has a range of values for red, green, blue, and intensity. Relevant color characteristics may include how the ranges of values change through the frame set, whether some colors occur more frequently than other colors (selection), whether some color groupings shift within the frame set, and whether differences between one grouping and another vary greatly across the frame set (contrast).

In one embodiment, an artificial intelligence (AI) system 504, such as a neural network or expert system, receives the characteristics 502 of the scene 206, as well as a target data rate 506 for the compressed media signal 210. The AI system 504 then determines whether a codec 110 exists in the library 308 that has previously been found to optimally compress a scene 206 with the given characteristics 502 at the target data rate 506. As explained below, the AI system 504 may be conceptualized as “storing” associations between sets of characteristics 502 and optimal codecs 110. If an association is found, the selection module 306 outputs the codec 110 (or an indication thereof) as the “selected” codec 110.

In many cases, a scene 206 having the specified characteristics 502 may not have been previously encountered. Accordingly, the selection module 306 makes a copy of the scene 206, referred to herein as a baseline snapshot 508, which serves as a reference point for determining compression quality.

Thereafter, a compression module 510 tests different codecs 110 from the codec library 308 on the scene 206. In one embodiment, the compression module 510 is also the compression module 310 of FIG. 3. As depicted, the compression module 510 compresses the scene 206 using different codecs 110 at the target data rate 506 to produce multiple compressed test scenes 512.

The codecs 110 may be tested sequentially, at random, or in other ways, and all of the codecs 110 in the library need not be tested. In one embodiment, input from the AI system 504 may assist with selecting a subset of the codecs 110 from the library 308 for testing. In some cases, a time limit may be imposed for codec testing in order to facilitate real-time compression. Thus, when the time limit is reached, no additional compressed test scenes 512 are generated.
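The time-limited testing described above might be sketched as follows, purely as an example. The function `run_codec_trials`, the injectable `clock` parameter, and the decision always to test at least one codec are assumptions of this sketch. Standard-library lossless compressors again stand in for the codecs 110, and the target data rate 506 is not modeled.

```python
import bz2
import lzma
import time
import zlib


def run_codec_trials(scene, codecs, time_limit_s, clock=time.monotonic):
    """Compress `scene` with each codec in turn until the time limit is reached.

    Returns a dict mapping codec name -> compressed test scene.
    """
    deadline = clock() + time_limit_s
    test_scenes = {}
    for name, compress in codecs.items():
        # Always test at least one codec; afterwards stop once time is up.
        if test_scenes and clock() >= deadline:
            break
        test_scenes[name] = compress(scene)
    return test_scenes


# Stand-in codec library, tested sequentially in insertion order.
codecs = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
trials = run_codec_trials(b"example scene data " * 100, codecs, time_limit_s=0.5)
```

Passing the clock as a parameter is a design choice that makes the cutoff behavior reproducible in tests; a real-time system would simply use the default monotonic clock.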

In one embodiment, a comparison module 514 compares the compression quality of each compressed test scene 512 with the baseline snapshot 508 according to a set of criteria 516. The criteria 516 may be based on a comparison of Peak Signal to Noise Ratios (PSNRs), which may be calculated, for an M×N frame, by:

$\mathrm{PSNR} = 20 \times \log_{10}\left( \frac{255}{\sqrt{\frac{1}{M \times N}\sum\limits_{m = 0}^{M - 1}\sum\limits_{n = 0}^{N - 1}\left[ f^{\prime}(m,n) - f(m,n) \right]^{2}}} \right) \qquad \text{(Eq. 1)}$

where f is the original frame and f′ is the decompressed frame (i.e., the frame recovered after compression and decompression).
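For illustration, Eq. 1 may be evaluated with a few lines of code. The function name `psnr`, the representation of a frame as a list of rows of 8-bit sample values, and the treatment of identical frames as having infinite PSNR are assumptions of this sketch.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio (in dB) of `decoded` against `original`.

    Both frames are M x N lists of rows; `peak` is the maximum sample value
    (255 for 8-bit samples, as in Eq. 1).
    """
    m, n = len(original), len(original[0])
    squared_error = sum(
        (decoded[i][j] - original[i][j]) ** 2
        for i in range(m)
        for j in range(n)
    )
    mse = squared_error / (m * n)
    if mse == 0:
        return math.inf  # identical frames: the ratio is unbounded
    return 20 * math.log10(peak / math.sqrt(mse))


f = [[100, 100], [100, 100]]
f_prime = [[101, 99], [100, 100]]  # two samples are off by one
quality_db = psnr(f, f_prime)
```

A higher value indicates that the frame recovered after compression and decompression is closer to the original, which is why the highest PSNR is treated as the highest compression quality.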

Alternatively, Root Mean Square Error (RMSE), Signal to Noise Ratio (SNR), or other objective quality metrics may be used as known to those of skill in the art.

In certain embodiments, a Just Noticeable Difference (JND) image quality metric calculation may be used. JND is a robust objective picture quality measurement method known to those skilled in the art. It includes three dimensions for evaluation of dynamic and complex motion sequences: spatial analysis, temporal analysis, and full color analysis. By using a model of the human visual system in a picture differencing process, JND produces results that are independent of the compression process and resulting artifacts.

In one embodiment, the comparison module 514 automatically selects the codec 110 used to generate the compressed scene 512 that has the highest compression quality when compared to the baseline snapshot 508 according to the set of criteria 516. That codec 110 (or an indication thereof) is then output by the selection module 306 as the selected codec 110.

The comparison module 514 tells the AI system 504 which codec 110 was selected for the scene 206. This allows the AI system 504 to make an association between the identified characteristics 502 of the scene 206 and the selected codec 110. Thus, in the future, the AI system 504 may automatically select the codec 110 for a similar scene 206 without the need for retesting by the comparison module 514.

Referring also to FIG. 3, in one configuration, the highest-quality compressed test scene 512a is simply passed to the output module 312 (not shown) to be included in the compressed media signal 210. However, the compression module 310 could recompress the scene 206 using the selected codec 110 in certain embodiments.

FIG. 6 provides an example of the above-described process. Suppose that the identification module 304 finds a scene 206a having a particular set of characteristics 502a. In one embodiment, the AI system 504 searches for an association 602 between the characteristics 502a and a particular codec 110. While the AI system 504 is depicted as including characteristics 502, associations 602, and codecs 110, those skilled in the art will recognize that these entities may be represented by codes, hashes, or other identifiers in various implementations.

Assuming that no such association 602 is found, a baseline snapshot 508 of the scene 206a is taken. In addition, the compression module 510 compresses the scene 206a at the target data rate 506 using a number of different codecs 110a-c from the codec library 308 to create a plurality of compressed test scenes 512a-c. These test scenes 512a-c are then compared against the baseline snapshot 508 according to a set of criteria 516, e.g., PSNR.

Suppose that the compressed test scene 512a produced by one codec 110a (“Codec 1”) results in the highest compression quality, e.g., the highest PSNR. In such a case, the comparison module 514 would inform the AI system 504 so that an association 602 could be made between the characteristics 502a of the scene 206a and the selected codec 110a. Thus, if a scene 206 having the same characteristics 502a is encountered in the future, the AI system 504 could simply identify the optimal codec 110a without the need for retesting.
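The flow of FIG. 6 might be sketched, under simplifying assumptions, as a lookup followed by testing on a miss. Here a plain dictionary stands in for the AI system 504 and its associations 602, the two toy lossy "codecs" (`quantize` and `subsample`) each model a full compress-then-decompress round trip, and quality is measured as negative mean squared error rather than PSNR. All of these names and choices are hypothetical, and the target data rate 506 is not modeled.

```python
def quantize(scene, step=16):
    """Toy lossy codec: round every sample to a multiple of `step`."""
    return [step * round(s / step) for s in scene]


def subsample(scene, factor=2):
    """Toy lossy codec: keep every `factor`-th sample and repeat it."""
    return [scene[i - i % factor] for i in range(len(scene))]


def neg_mse(original, decoded):
    """Quality score: higher (closer to zero) means closer to the baseline."""
    return -sum((d - o) ** 2 for o, d in zip(original, decoded)) / len(original)


def select_codec(characteristics, scene, codecs, quality, associations):
    """Return the remembered codec, or test all codecs and remember the best."""
    key = tuple(sorted(characteristics.items()))
    if key in associations:
        return associations[key]          # association found: no retesting
    baseline = list(scene)                # baseline snapshot of the scene
    best = max(codecs, key=lambda name: quality(baseline, codecs[name](scene)))
    associations[key] = best              # record the new association
    return best


codecs = {"quantize": quantize, "subsample": subsample}
associations = {}
traits = {"motion": "low", "contrast": "high"}
scene = [10, 10, 10, 10, 200, 200, 200, 200]
first = select_codec(traits, scene, codecs, neg_mse, associations)
# Same characteristics again: answered from the stored association.
second = select_codec(traits, [], codecs, neg_mse, associations)
```

For this flat, blocky toy scene the subsampling stand-in reproduces the baseline exactly, so it is selected and remembered. The second call never touches the codecs, which mirrors the "without the need for retesting" behavior described above.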

Referring to FIG. 7, the AI system 504 may be implemented using a typical feedforward neural network 700 comprising a plurality of artificial neurons 702. A neuron 702 receives a number of inputs (either from original data, or from the output of other neurons in the neural network 700). Each input comes via a connection that has a strength (or “weight”); these weights correspond to synaptic efficacy in a biological neuron. Each neuron 702 also has a single threshold value. The weighted sum of the inputs is formed, and the threshold subtracted, to compose the “activation” of the neuron 702 (also known as the post-synaptic potential, or PSP, of the neuron 702). The activation signal is passed through an activation function (also known as a transfer function) to produce the output of the neuron 702.

As illustrated, a typical neural network 700 has neurons 702 arranged in a distinct layered topology. The “input” layer 704 is not composed of neurons 702, per se. These units simply serve to introduce the values of the input variables (i.e., the scene characteristics 502). Neurons 702 in the hidden 706 and output 708 layers are each connected to all of the units in the preceding layer.

When the network 700 is executed, the input variable values are placed in the input units, and then the hidden and output layer units are progressively executed. Each of them calculates its activation value by taking the weighted sum of the outputs of the units in the preceding layer, and subtracting the threshold. The activation value is passed through the activation function to produce the output of the neuron 702. When the entire neural network 700 has been executed, the outputs of the output layer 708 act as the output of the entire network 700 (i.e., the selected codec 110).
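The execution of such a network might be sketched as below. The logistic activation function, the example weights and thresholds, the two-element characteristic vector, and the rule of choosing the codec whose output unit is largest are all assumptions of this sketch; no trained values are given in this description.

```python
import math


def logistic(x):
    """One common choice of activation (transfer) function."""
    return 1.0 / (1.0 + math.exp(-x))


def run_layer(inputs, weights, thresholds):
    """Execute one layer: weighted sum minus threshold, then activation."""
    outputs = []
    for neuron_weights, threshold in zip(weights, thresholds):
        activation = sum(w * x for w, x in zip(neuron_weights, inputs)) - threshold
        outputs.append(logistic(activation))
    return outputs


def run_network(inputs, layers):
    """Place values in the input units, then execute hidden and output layers."""
    values = list(inputs)
    for weights, thresholds in layers:
        values = run_layer(values, weights, thresholds)
    return values


# Hypothetical, untrained parameters: 2 inputs -> 2 hidden -> 3 output units.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, 0.2]),               # hidden layer
    ([[1.0, -1.0], [-0.5, 0.9], [0.2, 0.2]], [0.0, 0.1, 0.3]),  # output layer
]
codec_names = ["Codec 1", "Codec 2", "Codec 3"]
outputs = run_network([0.2, 0.9], layers)      # inputs: scene characteristics
selected = codec_names[outputs.index(max(outputs))]
```

Each output unit corresponds to one codec in this sketch, so the index of the largest output is read as the selected codec.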

While a feedforward neural network 700 is depicted in FIG. 7, those of skill in the art will recognize that other types of neural networks 700 may be used, such as feedback networks, Back-Propagated Delta Rule Networks (BP) and Radial Basis Function Networks (RBF). In other embodiments, an entirely different type of AI system 504 may be used, such as an expert system.

In still other embodiments, the AI system 504 may be replaced by lookup tables, databases, or other data structures that are capable of searching for a codec 110 based on a specified set of characteristics 502. Thus, the invention should not be construed as requiring an AI system 504.

Referring to FIG. 8, the invention is not limited to embodiments in which different codecs 110 are used to respectively encode different scenes 206 of an original media signal 108. As illustrated, a single codec 110 may be used in one embodiment. However, different settings 804 (parameters) for the codec 110 may be automatically selected in much the same way that different codecs 110 were selected in the preceding embodiments.

As used herein, codec settings 804 refer to standard parameters such as the motion estimation method, the GOP size (keyframe interval), types of transforms (e.g., DCT vs. wavelet), noise reduction for luminance or chrominance, decoder deblocking level, preprocessing/postprocessing filters (such as sharpening and denoising), etc.

As before, suppose that the identification module 304 finds a scene 206 a having a given set of characteristics 502 a. In one embodiment, the AI system 504 searches for an association 802 between the characteristics 502 a and one or more settings 804 a for the codec 110.

Assume that no such association 802 is found. In one configuration, a baseline snapshot 508 of the scene 206 a is taken. In addition, the compression module 510 compresses the scene 206 a at the target data rate 506 using the same codec 110 but with different settings 804 a-c. The resulting compressed test scenes 512 a-c are then compared against the baseline snapshot 508 according to a set of criteria 516, e.g., PSNR.

Suppose that the compressed test scene 512 a produced by one group of settings 804 a (“Settings 1”) results in the highest compression quality, e.g., the highest PSNR. In such a case, the comparison module 514 would inform the AI system 504, so that an association 802 could be made between the characteristics 502 a of the scene 206 a and the selected group of settings 804 a. Accordingly, if a scene 206 having the same characteristics 502 a is encountered in the future, the AI system 504 could simply identify the optimal settings 804 a without the need for retesting.
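The test-and-remember procedure of the preceding paragraphs may be sketched, for illustration only, as follows. The sketch assumes that a scene is represented as a flat list of 8-bit sample values, that the caller supplies a hypothetical compress callable which already honors the target data rate 506 and returns the decoded samples, and that a plain dictionary stands in for the associations 802 held by the AI system 504.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def choose_settings(characteristics, scene, candidates, compress, associations):
    """Return the name of the best settings group for a scene.

    candidates maps a settings name to a settings object understood by
    compress(scene, settings); both are hypothetical placeholders.
    """
    key = tuple(characteristics)
    if key in associations:
        # A matching association exists, so no retesting is needed.
        return associations[key]
    # Otherwise compress with every candidate and compare against the baseline.
    best_name = max(
        candidates,
        key=lambda name: psnr(scene, compress(scene, candidates[name])),
    )
    associations[key] = best_name  # remember the result for future scenes
    return best_name
```

On a second call with the same characteristics the function returns the stored result without invoking the compressor again.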

In still other embodiments, the AI system 504 may search for both different codecs 110 and different codec settings 804 based on a given set of characteristics 502. Likewise, the compression module 510 may generate compressed test scenes 512 based on combinations of different codecs 110 and different settings 804. The comparison module 514 may then select the best combination of codec 110 and settings 804 for a given scene 206.

In one embodiment, as shown in FIG. 9, the comparison module 514 may consider other factors in addition to (or in lieu of) compression quality in determining which codec 110 and/or settings 804 to automatically select for a particular scene 206. For instance, the use of certain codecs 110 may incur licensing costs 902 based on patents or other intellectual property rights. The licensing costs 902 may be tied to the number of times the codec 110 is used, the amount of data compressed using the codec 110, or in other ways.

While one codec 110 may provide an exceptionally high compression quality (e.g., PSNR), its licensing cost 902 may exceed the value of the transmission, in which case its use would not be cost justified. Indications of the licensing costs 902 for various codecs 110 may be stored within the codec library 308 or at other locations accessible by the comparison module 514.

In one embodiment, the licensing costs 902 are considered only when a number of the top codecs 110 produce similar results, e.g., the compression qualities differ by no more than a threshold amount. In the example of FIG. 9, the first three codecs 110 produce output of similar quality. However, the codec 110 with the highest PSNR score is more than two times more expensive than the codec 110 with the next highest PSNR score, which is, itself, almost three times more expensive than the codec 110 with the third highest PSNR score. In one configuration, the comparison module 514 would select the codec 110 with the third highest PSNR score due to its much lower licensing cost 902.
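A minimal sketch of this cost-aware selection follows. The PSNR scores, cost figures, and threshold used in the usage note are hypothetical and are not taken from FIG. 9; they merely reproduce the cost ratios described above. The function name select_with_licensing and the rule of choosing the cheapest codec among those within the threshold are assumptions made for illustration.

```python
def select_with_licensing(results, quality_threshold_db):
    """Pick a codec from (name, psnr_db, licensing_cost) tuples.

    Codecs whose PSNR is within quality_threshold_db of the best score are
    treated as producing similar results; among those, the cheapest is chosen,
    with the higher PSNR breaking any tie in cost.
    """
    best_psnr = max(score for _, score, _ in results)
    contenders = [r for r in results if best_psnr - r[1] <= quality_threshold_db]
    name, _, _ = min(contenders, key=lambda r: (r[2], -r[1]))
    return name
```

For example, with hypothetical results [("A", 36.2, 0.10), ("B", 36.0, 0.04), ("C", 35.8, 0.015), ("D", 31.0, 0.001)] and a threshold of 0.5 dB, codecs A, B, and C are considered similar and C is selected. Codec D is cheaper still but falls outside the threshold. With a threshold of 0.1 dB only A qualifies, so cost plays no part in the decision.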

In other embodiments, the comparison module 514 may create a composite score (not shown) based on the PSNR score, the licensing cost 902, and other possible factors. In still other embodiments, the comparison module 514 may calculate an anticipated cost (not shown) for the entire transmission and seek to minimize that cost over all of the codec selection decisions. Hence, the comparison module 514 might select a more expensive codec 110 for certain scenes 206, where a substantial increase in quality is realized, while selecting less expensive codecs 110 for other scenes.

Referring to FIG. 10, a user of the source system 202 may specify a particular target data rate 506, e.g., 512 kbps, for video communication. However, there is no guarantee that the destination system 204 will be able to process data that quickly. Moreover, there is no guarantee that the network 114 will always provide the same amount of bandwidth. As a result, there may be a need to periodically change the target data rate 506 within the selection module 306 of the source system 202, since the target data rate 506 will affect which codecs 110 are selected for various scenes 206.

For example, as shown in FIG. 10, the destination system 204 may be embodied as a video-enabled cellular telephone. Typically, the bandwidth over cellular networks 114 is limited. Similarly, the processing power of a cellular telephone is substantially less than that of a personal computer or dedicated video conferencing system.

Thus, although the user of the source system 202 specifies a target data rate 506 of 512 kbps, the destination system 204 and/or network 114 may not be up to the challenge. In one embodiment, in response to receiving a connection request, the destination system 204 provides the source system 202 with a modified target data rate 1002, e.g., 128 kbps. The modified rate 1002 may be communicated to the source system 202 using any standard data structure or technique. Thereafter, depending on the configuration, the target data rate 506 may be replaced by the modified rate 1002.

In certain embodiments, an actual data rate is not communicated. Rather, a message is sent specifying one or more constraints or capabilities of the destination system 204 or network 114, in which case it would be up to the source system 202 to revise the target data rate 506 as appropriate. A technique of altering the target data rate 506 in response to various conditions is referred to herein as “dynamic streaming.”

In one embodiment, dynamic streaming may be employed where no specific message is sent by the destination system 204. The source system 202 may use latency calculations, requests to resend lost packets, etc., to dynamically determine the target data rate 506 for purposes of codec and/or parameter selection.
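One possible way for the source system 202 to revise the target data rate 506 from such observations is sketched below. The description does not specify an adjustment rule; the multiplicative back-off and additive recovery shown here, together with every threshold and default value, are assumptions chosen purely for illustration, and the function name adjust_target_rate is hypothetical.

```python
def adjust_target_rate(
    current_kbps,
    loss_fraction,
    latency_ms,
    *,
    floor_kbps=64,
    ceiling_kbps=512,
    max_loss=0.02,
    max_latency_ms=250,
    backoff=0.75,
    step_kbps=16,
):
    """Return a revised target data rate in kbps.

    loss_fraction is the share of packets for which a resend was requested;
    latency_ms is the most recently measured latency. All defaults are
    illustrative assumptions, not values taken from the description.
    """
    congested = loss_fraction > max_loss or latency_ms > max_latency_ms
    if congested:
        proposed = current_kbps * backoff  # back off quickly under congestion
    else:
        proposed = current_kbps + step_kbps  # recover slowly when healthy
    # Never exceed the user's requested rate or drop below a usable minimum.
    return max(floor_kbps, min(ceiling_kbps, proposed))
```

The revised rate would then be supplied to the selection module 306 in place of the previous target data rate 506 before codecs or settings are chosen for subsequent scenes.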

In one configuration, as shown in FIG. 11, video frames 1102 within a scene 206 may be subdivided into a plurality of sub-frames 1104. While the depicted video frame 1102 is subdivided into four sub-frames 1104 a-d of equal size, the invention is not limited in this respect. For instance, a video frame 1102 may be subdivided into any number of sub-frames 1104, although too many sub-frames 1104 may adversely affect compression quality. Moreover, the sub-frames 1104 need not be of equal size. For example, sub-frames 1104 near the center of the video frame 1102 may be smaller due to the relatively greater amount of motion in this area.

In certain embodiments, the sub-frames 1104 may be defined by objects represented within the video frame 1102. As an example, the head of a person could be defined as a separate object and, hence, a different sub-frame 1104 from the background. Algorithms (e.g., MPEG-4) for objectifying a scene within a video frame 1102 are known in the art.

A set of sub-frames 1104 a-d within a scene 206 exhibit characteristics 502 a-d, and may be treated, for practical purposes, like a complete video frame 1102. Accordingly, using the techniques described above, the characteristics 502 a-d may be used to determine an optimal codec 110 a-d for compressing the respective sub-frames 1104 a-d. For example, an AI system 504 (not shown) may be used to determine whether an association 602 exists between a set of characteristics 502 and a particular codec 110. If no association 602 exists, compression 510 and comparison 514 modules (not shown) may be used to test a plurality of codecs 110 on the respective sub-frames 1104 to determine the optimal codec 110.

Thus, different sub-frames 1104 a-d of a single scene 206 may be compressed using different codecs 110 a-d. In the illustrated embodiment, four different codecs 110 a-d are used.
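For illustration only, the equal-size subdivision and per-sub-frame codec selection may be sketched as follows. The frame is assumed to be a list of rows of sample values, the assignment of the labels a through d to particular quadrants is an assumption, and the characterize and choose_codec callables are hypothetical placeholders for the identification and selection logic described above.

```python
def split_into_quadrants(frame):
    """Split a frame (list of rows of samples) into four sub-frames."""
    rows, cols = len(frame), len(frame[0])
    mid_r, mid_c = rows // 2, cols // 2

    def crop(r0, r1, c0, c1):
        return [row[c0:c1] for row in frame[r0:r1]]

    return {
        "a": crop(0, mid_r, 0, mid_c),        # top left
        "b": crop(0, mid_r, mid_c, cols),     # top right
        "c": crop(mid_r, rows, 0, mid_c),     # bottom left
        "d": crop(mid_r, rows, mid_c, cols),  # bottom right
    }


def assign_codecs(sub_frames, characterize, choose_codec):
    """Treat each sub-frame like a complete frame and pick a codec for it."""
    return {
        label: choose_codec(characterize(pixels))
        for label, pixels in sub_frames.items()
    }
```

Because each sub-frame is handled independently, nothing in this sketch prevents two sub-frames from receiving the same codec when their characteristics are similar.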

While specific embodiments and applications of the present invention have been illustrated and described, it is to be understood that the invention is not limited to the precise configuration and components disclosed herein. Various modifications, changes, and variations apparent to those of skill in the art may be made in the arrangement, operation, and details of the methods and systems of the present invention disclosed herein without departing from the spirit and scope of the present invention.

1. A media compression method comprising: obtaining a media signal to be communicated to a destination system; identifying a plurality of scenes within the media signal; automatically selecting different codecs from a codec library to respectively compress at least two of the scenes, wherein the codecs are automatically selected to produce a highest compression quality for the respective scenes according to a set of criteria without exceeding a target data rate; compressing the scenes using the automatically selected codecs; and delivering the compressed scenes to the destination system with an indication of which codec was used to compress each scene.